Exploratory Data Analysis

Code
import folium
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.colors import rgb2hex
import numpy as np

dataset_gpd = gpd.read_file(
    "https://gitlab.com/drvicsana/opt-milp-project-2025/-/raw/main/datasets/dataset.geojson"
)

Available Data

The dataset contains the following attributes for each cell in the discretized map of Menorca:

Attribute Description Domain
grid_id Identifier of a cell in Menorca’s map (column and row number) String
dominant_land_cover_name Dominant type of land in the cell String
cost_adaptation_atelerix Cost of making some adaptations in the cell for the atelerix Float
cost_adaptation_martes Cost of making some adaptations in the cell for the martes martes Float
cost_adaptation_eliomys Cost of making some adaptations in the cell for the eliomys quercinus Float
cost_adaptation_oryctolagus Cost of making some adaptations in the cell for the oryctolagus cuniculus Float
cost_corridor Cost of preparing a single corridor in the cell Float
has_atelerix_algirus Whether or not the cell has a living colony of atelerix Boolean
has_martes_martes Whether or not the cell has a living colony of martes martes Boolean
has_eliomys_quercinus Whether or not the cell has a living colony of eliomys quercinus Boolean
has_oryctolagus_cuniculus Whether or not the cell has a living colony of european rabbit Boolean
geometry The polygon that describes the cell Geometry

Corridor Cost Analysis

We observe that the distribution of corridor costs is normally distributed.

plt.figure(figsize=(10, 6))
plt.hist(dataset_gpd["cost_corridor"], bins=25, edgecolor="black")
plt.xlabel("Cost of Corridor")
plt.ylabel("Frequency")
plt.title("Distribution of Cost of corridor")
plt.show()

Adaptation Analysis

Here is analysed the adaptation for the four species considered in the project: Oryctolagus cuniculus, Eliomys quercinus, Martes martes and Atelerix algirus.

Sustainability Scores

Firstly, is important to assign a suitability score to each land cover type for each species. These scores are based on ecological studies and expert opinions regarding the habitat preferences of each species.

Code
suitability_cuniculus = {
    "Discontinuous Urban Fabric": "Low",
    "Continuous Urban Fabric": "Very Low",
    "Industrial or Commercial Units": "Very Low",
    "Airports": "Very Low",
    "Port Areas": "Very Low",
    "Sport and Leisure Facilities": "Very Low",
    "Pastures": "High",
    "Non-irrigated Arable Land": "High",
    "Permanently Irrigated Land": "Low",
    "Complex Cultivation Patterns": "High",
    "Land Principally Occupied by Agriculture with Significant Areas of Natural Vegetation": "High",
    "Sclerophyllous Vegetation": "High",
    "Transitional Woodland-Shrub": "High",
    "Natural Grasslands": "High",
    "Broad-leaved Forests": "Moderate",
    "Mixed Forests": "Moderate",
    "Coniferous Forests": "Low-Moderate",
    "Peatbogs": "Very Low",
    "Inland Marshes": "Very Low",
    "Coastal Lagoons": "Very Low",
    "Estuaries": "Very Low",
    "Intertidal Flats": "Very Low",
    "Water Courses": "Low",
}

suitability_quercinus = {
    "Discontinuous Urban Fabric": "Low",
    "Continuous Urban Fabric": "Very Low",
    "Industrial or Commercial Units": "Very Low",
    "Airports": "Very Low",
    "Port Areas": "Very Low",
    "Sport and Leisure Facilities": "Very Low",
    "Pastures": "Low",
    "Non-irrigated Arable Land": "Low",
    "Permanently Irrigated Land": "Very Low",
    "Complex Cultivation Patterns": "Moderate",
    "Land Principally Occupied by Agriculture with Significant Areas of Natural Vegetation": "Moderate-High",
    "Sclerophyllous Vegetation": "High",
    "Transitional Woodland-Shrub": "High",
    "Natural Grasslands": "Low",
    "Broad-leaved Forests": "High",
    "Mixed Forests": "High",
    "Coniferous Forests": "Moderate-High",
    "Peatbogs": "Very Low",
    "Inland Marshes": "Very Low",
    "Coastal Lagoons": "Very Low",
    "Estuaries": "Very Low",
    "Intertidal Flats": "Very Low",
    "Water Courses": "Low-Moderate",
}

suitability_martes = {
    "Discontinuous Urban Fabric": "Low",
    "Continuous Urban Fabric": "Very Low",
    "Industrial or Commercial Units": "Very Low",
    "Airports": "Very Low",
    "Port Areas": "Very Low",
    "Sport and Leisure Facilities": "Very Low",
    "Pastures": "Low",
    "Non-irrigated Arable Land": "Low",
    "Permanently Irrigated Land": "Very Low",
    "Complex Cultivation Patterns": "Moderate",
    "Land Principally Occupied by Agriculture with Significant Areas of Natural Vegetation": "Moderate-High",
    "Sclerophyllous Vegetation": "High",
    "Transitional Woodland-Shrub": "High",
    "Natural Grasslands": "Low-Moderate",
    "Broad-leaved Forests": "High",
    "Mixed Forests": "High",
    "Coniferous Forests": "Moderate-High",
    "Peatbogs": "Very Low",
    "Inland Marshes": "Very Low",
    "Coastal Lagoons": "Very Low",
    "Estuaries": "Very Low",
    "Intertidal Flats": "Very Low",
    "Water Courses": "Low-Moderate",
}

suitability_algirus = {
    "Discontinuous Urban Fabric": "Moderate",
    "Continuous Urban Fabric": "Low",
    "Industrial or Commercial Units": "Very Low",
    "Airports": "Very Low",
    "Port Areas": "Very Low",
    "Sport and Leisure Facilities": "Very Low",
    "Pastures": "High",
    "Non-irrigated Arable Land": "High",
    "Permanently Irrigated Land": "Low",
    "Complex Cultivation Patterns": "High",
    "Land Principally Occupied by Agriculture with Significant Areas of Natural Vegetation": "High",
    "Sclerophyllous Vegetation": "High",
    "Transitional Woodland-Shrub": "High",
    "Natural Grasslands": "Moderate-High",
    "Broad-leaved Forests": "Moderate",
    "Mixed Forests": "Moderate",
    "Coniferous Forests": "Low",
    "Peatbogs": "Very Low",
    "Inland Marshes": "Very Low",
    "Coastal Lagoons": "Very Low",
    "Estuaries": "Very Low",
    "Intertidal Flats": "Very Low",
    "Water Courses": "Low-Moderate",
}

dataset_gpd["suitability_cuniculus"] = dataset_gpd["dominant_land_cover_name"].map(
    suitability_cuniculus
)
dataset_gpd["suitability_quercinus"] = dataset_gpd["dominant_land_cover_name"].map(
    suitability_quercinus
)
dataset_gpd["suitability_martes"] = dataset_gpd["dominant_land_cover_name"].map(
    suitability_martes
)
dataset_gpd["suitability_algirus"] = dataset_gpd["dominant_land_cover_name"].map(
    suitability_algirus
)

Observing the suitability assignment, there are some cells that have a land cover type that is not considered in the suitability mapping, resulting in NaN values for their suitability scores. These cells need to be handled appropriately before proceeding with further analysis. In this case, these cells will be dropped from the dataset to ensure that all remaining cells have valid suitability scores for all species.

dataset_gpd[dataset_gpd["suitability_cuniculus"].isnull()]
grid_id cell_area_km2 dominant_land_cover_name cost_adaptation_atelerix cost_adaptation_martes cost_adaptation_eliomys cost_adaptation_oryctolagus cost_corridor has_atelerix_algirus has_martes_martes has_eliomys_quercinus has_oryctolagus_cuniculus geometry suitability_cuniculus suitability_quercinus suitability_martes suitability_algirus
510 cell_26_39 0.000360 Unknown 5.32 5.50 5.83 8.41 1.95 False False False False MULTIPOLYGON (((4.02684 40.06534, 4.02693 40.0... NaN NaN NaN NaN
630 cell_31_43 0.001558 Unknown 2.97 4.19 5.71 7.16 2.28 False False False False POLYGON ((4.07326 40.09201, 4.07318 40.09206, ... NaN NaN NaN NaN
659 cell_32_43 0.059836 Unknown 4.83 5.06 4.60 5.37 1.23 False False False False POLYGON ((4.07337 40.09195, 4.07326 40.09201, ... NaN NaN NaN NaN
750 cell_35_38 0.000178 Unknown 3.89 6.36 6.25 6.04 2.01 False False False False POLYGON ((4.09911 40.05792, 4.09911 40.05807, ... NaN NaN NaN NaN
838 cell_38_39 0.002063 Unknown 6.18 4.72 3.23 3.99 2.98 False False False False POLYGON ((4.13082 40.06437, 4.1308 40.06439, 4... NaN NaN NaN NaN
1116 cell_47_32 0.094412 Unknown 7.88 3.41 5.06 4.81 1.85 False False False False MULTIPOLYGON (((4.20868 40.01629, 4.20876 40.0... NaN NaN NaN NaN
1145 cell_48_31 0.001918 Unknown 5.02 6.21 6.69 5.42 2.38 False False False False POLYGON ((4.21266 40.01469, 4.21274 40.01469, ... NaN NaN NaN NaN
1146 cell_48_32 0.000136 Unknown 3.68 4.37 6.10 5.90 1.48 False False False False POLYGON ((4.21272 40.01825, 4.21283 40.01813, ... NaN NaN NaN NaN

For the sustainability visualization, it is convenient to convert the categorical suitability scores into numerical values for easier mapping and visualization. The following mapping is used:

Suitability Score Numerical Value
Very Low 1
Low 2
Low-Moderate 3
Moderate 4
Moderate-High 5
High 6
Code
dataset_gpd = dataset_gpd.dropna().copy()

suitability_to_num = {
    "Very Low": 1,
    "Low": 2,
    "Low-Moderate": 3,
    "Moderate": 4,
    "Moderate-High": 5,
    "High": 6,
}

dataset_gpd["suitability_cuniculus_num"] = (
    dataset_gpd["suitability_cuniculus"].map(suitability_to_num).astype("int8")
)
dataset_gpd["suitability_quercinus_num"] = (
    dataset_gpd["suitability_quercinus"].map(suitability_to_num).astype("int8")
)
dataset_gpd["suitability_martes_num"] = (
    dataset_gpd["suitability_martes"].map(suitability_to_num).astype("int8")
)
dataset_gpd["suitability_algirus_num"] = (
    dataset_gpd["suitability_algirus"].map(suitability_to_num).astype("int8")
)

suitability_colors = {
    1: "red",
    2: "orange",
    3: "yellow",
    4: "lightgreen",
    5: "green",
    6: "darkgreen",
}


def build_species_map(
    gdf, value_col, species_label, center=(39.97, 4.0460), zoom=10, width="60%"
):
    m = folium.Map(location=center, zoom_start=zoom, tiles="OpenStreetMap", width=width)

    for _, row in gdf.iterrows():
        val = row[value_col]
        color = suitability_colors.get(val, "gray")
        land_cover_name = row["dominant_land_cover_name"]

        folium.GeoJson(
            row.geometry,
            style_function=lambda x, color=color: {
                "fillColor": color,
                "color": "black",
                "weight": 0.5,
                "fillOpacity": 0.7,
            },
            tooltip=(
                f"Grid ID: {row['grid_id']}<br>"
                f"Dominant Land Cover: {land_cover_name}<br>"
                f"Suitability ({species_label}): {val if pd.notna(val) else 'Unknown'}"
            ),
        ).add_to(m)

    return m


map_cuniculus = build_species_map(
    dataset_gpd, "suitability_cuniculus_num", "O. cuniculus"
)
map_quercinus = build_species_map(
    dataset_gpd, "suitability_quercinus_num", "E. quercinus"
)
map_martes = build_species_map(dataset_gpd, "suitability_martes_num", "M. martes")
map_algirus = build_species_map(dataset_gpd, "suitability_algirus_num", "A. algirus")

map_cuniculus
Make this Notebook Trusted to load map: File -> Trust Notebook

Suitability Maps for O. cuniculus

Code
map_algirus
Make this Notebook Trusted to load map: File -> Trust Notebook

Suitability Maps for A. algirus

Code
map_quercinus
Make this Notebook Trusted to load map: File -> Trust Notebook

Suitability Maps for E. quercinus

Code
map_martes
Make this Notebook Trusted to load map: File -> Trust Notebook

Suitability Maps for M. martes

Visually inspecting the maps, it can be noted a certain relation between the species with respect to the suitability scores. The Oryctolagus cuniculus and Atelerix algirus seem to have a similar distribution of suitability scores across the island, with many areas showing high suitability for both species. On the other hand, the Eliomys quercinus and Martes martes also exhibit a similar pattern to each other, but their suitability distributions differ from the first two species. This suggests that the habitat preferences of these species may overlap to some extent, but there are also distinct differences in their ecological requirements. An hypothesis could be that the Martes martes would have the same habitat preferences as the Eliomys quercinus due to their predator-prey relationship.

Benefit Estimation

In order to estimate the benefit, is important to analyse the adaptation costs, while it is necessary to make a value in the same scale. So the first step is to observe the distribution of adaptation costs.

adaptation_cost_columns = [
    "cost_adaptation_atelerix",
    "cost_adaptation_martes",
    "cost_adaptation_eliomys",
    "cost_adaptation_oryctolagus",
]
dataset_gpd[adaptation_cost_columns].hist(bins=50, figsize=(15, 10), grid=False)
pass

costes = dataset_gpd[dataset_gpd["cost_adaptation_oryctolagus"] < 30][
    "cost_adaptation_oryctolagus"
]
costes.hist(bins=100, figsize=(12, 8), grid=False)
pass

Code
costes = dataset_gpd[adaptation_cost_columns].to_numpy().flatten()

Observing the distibution plots, it seems that, for all the species, the costs follow a mixture of two distributions: a large number of low values and a small number of very high values. This suggests that most cells have relatively low adaptation costs, while a few cells are significantly more expensive to adapt. This pattern could reflect the varying suitability of different land types for each species, with some areas requiring minimal modifications and others needing substantial changes to become viable habitats.

It is also important to note that, the oryctolagus cuniculus, presents really high adaptation costs in comparison to the other species in some parcels.

For these reasons, the benefit, will be calculated as the suitability score multiplied by a cost multiplier based on the 75th percentile of adaptation costs, a value of 6.82. The goal is to get a good trade-off between suitability and cost, prioritizing areas that offer high suitability at a reasonable adaptation cost.

Here is the table that shows the mapping from suitability scores to benefit multipliers:

Suitability Score Benefit Multiplier
Very Low 0.1
Low 0.3
Low-Moderate 0.5
Moderate 0.9
Moderate-High 1.4
High 2
Code
suitability_to_benefit = {
    "Very Low": 0.1,
    "Low": 0.3,
    "Low-Moderate": 0.5,
    "Moderate": 0.9,
    "Moderate-High": 1.4,
    "High": 2,
}

multiplier = np.quantile(costes, 0.75)


dataset_gpd["suitability_cuniculus_benefit"] = dataset_gpd["suitability_cuniculus"].map(
    suitability_to_benefit
)
dataset_gpd["suitability_quercinus_benefit"] = dataset_gpd["suitability_quercinus"].map(
    suitability_to_benefit
)
dataset_gpd["suitability_martes_benefit"] = dataset_gpd["suitability_martes"].map(
    suitability_to_benefit
)
dataset_gpd["suitability_algirus_benefit"] = dataset_gpd["suitability_algirus"].map(
    suitability_to_benefit
)

dataset_gpd["cuniculus_benefit"] = (
    dataset_gpd["suitability_cuniculus_benefit"] * multiplier
)
dataset_gpd["quercinus_benefit"] = (
    dataset_gpd["suitability_quercinus_benefit"] * multiplier
)
dataset_gpd["martes_benefit"] = dataset_gpd["suitability_martes_benefit"] * multiplier
dataset_gpd["algirus_benefit"] = dataset_gpd["suitability_algirus_benefit"] * multiplier
dataset_gpd.head()
grid_id cell_area_km2 dominant_land_cover_name cost_adaptation_atelerix cost_adaptation_martes cost_adaptation_eliomys cost_adaptation_oryctolagus cost_corridor has_atelerix_algirus has_martes_martes ... suitability_martes_num suitability_algirus_num suitability_cuniculus_benefit suitability_quercinus_benefit suitability_martes_benefit suitability_algirus_benefit cuniculus_benefit quercinus_benefit martes_benefit algirus_benefit
0 cell_0_28 0.032397 Discontinuous Urban Fabric 4.50 6.37 7.00 6.34 2.04 False False ... 2 4 0.3 0.3 0.3 0.9 2.046 2.046 2.046 6.138
1 cell_0_29 0.257537 Discontinuous Urban Fabric 3.86 9.50 7.25 6.91 1.90 True False ... 2 4 0.3 0.3 0.3 0.9 2.046 2.046 2.046 6.138
2 cell_0_30 0.280383 Pastures 2.39 8.42 8.93 2.31 2.35 False False ... 2 6 2.0 0.3 0.3 2.0 13.640 2.046 2.046 13.640
3 cell_0_31 0.481008 Pastures 2.91 5.74 8.85 2.74 2.23 False False ... 2 6 2.0 0.3 0.3 2.0 13.640 2.046 2.046 13.640
4 cell_0_32 0.040811 Pastures 1.86 7.13 10.03 2.06 0.88 False False ... 2 6 2.0 0.3 0.3 2.0 13.640 2.046 2.046 13.640

5 rows × 29 columns